
29 research outputs found

InterTran vs SysTran: Evaluation of Two Online Machine Translation Services

Author: Uneson Marcus
Publication date: 01/01/2005

Lund University Publications

Tomorrow's file endings: On archiving principles and archiving formats

Author: Uneson Marcus
Publication venue: Svenskt resurscentrum för vetenskaplig kommunikation (ScieCom)
Publication date: 01/01/2005

Lund University Publications

Knowledge-light Letter-to-Sound Conversion for Swedish with FST and TBL

Author: Uneson Marcus
Publication venue: Lund University Library
Publication date: 01/01/2006

This paper describes some exploratory attempts to apply a combination of finite-state transducers (FST) and transformation-based learning (TBL; Brill 1992) to letter-to-sound (LTS) conversion for Swedish. Following Bouma (2000) for Dutch, we use the FST both to segment the textual input into groups of letters and to produce a first transcription; the output of this step is fed into a TBL system. With this setup, we reach 96.2% correctly transcribed segments with rather restricted means: a small set of hand-crafted rules for the FST stage, and a set of 12 templates and a training set of 30k words for the TBL stage. Observing that quantity (vowel and consonant length) is the major error source, and that compound morpheme boundaries can be useful for inferring quantity, we experimentally add high-precision, low-recall compound splitting based on graphotactic constraints. With this simple method, which targets only a subset of the compounds, performance improves to 96.9%.
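The TBL stage described above can be illustrated with a minimal, generic sketch of Brill-style transformation-based learning. It is not the paper's implementation: it assumes a one-to-one alignment between letters and phones (the paper instead segments the input into letter groups with the FST), uses only two hypothetical rule templates (previous letter, next letter) rather than the paper's 12, and the names `Rule`, `learn` and `transcribe`, as well as the toy data, are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical TBL rule: rewrite phone `src` as `dst` when the letter at
# relative position `offset` (-1 = previous, +1 = next) equals `ctx`.
@dataclass(frozen=True, order=True)
class Rule:
    src: str
    dst: str
    offset: int
    ctx: str

    def applies(self, letters: List[str], phones: List[str], i: int) -> bool:
        j = i + self.offset
        return phones[i] == self.src and 0 <= j < len(letters) and letters[j] == self.ctx


# Training item: (letters, first-stage phones, gold phones), assuming a
# one-to-one letter/phone alignment (a simplification of the paper's setup).
Item = Tuple[List[str], List[str], List[str]]
OFFSETS = (-1, 1)  # the two hypothetical templates


def apply_rule(rule: Rule, letters: List[str], phones: List[str]) -> List[str]:
    # Every position is tested against the input sequence, then rewritten.
    return [rule.dst if rule.applies(letters, phones, i) else p
            for i, p in enumerate(phones)]


def n_correct(phones: List[str], gold: List[str]) -> int:
    return sum(p == g for p, g in zip(phones, gold))


def candidates(data: List[Item]) -> List[Rule]:
    # Instantiate the templates at every position that is currently wrong.
    found = set()
    for letters, phones, gold in data:
        for i, (p, g) in enumerate(zip(phones, gold)):
            if p == g:
                continue
            for off in OFFSETS:
                j = i + off
                if 0 <= j < len(letters):
                    found.add(Rule(p, g, off, letters[j]))
    return sorted(found)  # deterministic tie-breaking


def gain(rule: Rule, data: List[Item]) -> int:
    # Net number of segments fixed (fixes minus newly introduced errors).
    return sum(n_correct(apply_rule(rule, lets, ph), gold) - n_correct(ph, gold)
               for lets, ph, gold in data)


def learn(data: List[Item], max_rules: int = 10) -> List[Rule]:
    # Greedy loop: pick the best-scoring rule, apply it to the corpus, repeat.
    learned: List[Rule] = []
    for _ in range(max_rules):
        best: Optional[Rule] = None
        best_score = 0
        for rule in candidates(data):
            score = gain(rule, data)
            if score > best_score:
                best, best_score = rule, score
        if best is None:  # no rule improves the corpus any further
            break
        learned.append(best)
        data = [(lets, apply_rule(best, lets, ph), gold) for lets, ph, gold in data]
    return learned


def transcribe(rules: List[Rule], letters: List[str], phones: List[str]) -> List[str]:
    # Apply the learned rules, in order, to a first-stage transcription.
    for rule in rules:
        phones = apply_rule(rule, letters, phones)
    return phones


# Toy data: the first stage maps every <k> to /k/; gold has /ɕ/ before <e>, <i>.
train = [
    (list("ke"), ["k", "e"], ["ɕ", "e"]),
    (list("ka"), ["k", "a"], ["k", "a"]),
    (list("ki"), ["k", "i"], ["ɕ", "i"]),
]
rules = learn(train)
print(transcribe(rules, list("ki"), ["k", "i"]))  # ['ɕ', 'i']
```

Each iteration scores every candidate rule by the net number of segments it fixes and keeps only the best one, so the learned rules form an ordered list that has to be applied in the same order at transcription time.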

Lund University Publications

Deductive chart parsing in Haskell

Author: Uneson Marcus
Publication venue: Lund University, Dept. of Linguistics
Publication date: 01/01/2008

Lund University Publications

Digitizing Intangible Cultural Heritage

Author: Uneson Marcus
Wittenburg Peter
Publication venue: [Publisher information missing]
Publication date: 01/01/2004

As part of the UNESCO project "Establishment of a National Inventory and Electronic Database of Lithuanian Intangible Cultural Heritage", the authors, representing the EU-funded project "European Cultural Heritage Online" (ECHO), were invited to give a course in digital archiving called "Digitizing Intangible Cultural Heritage" in Vilnius, Lithuania, from March 15 to 20, 2004. The present report briefly summarizes the sessions given. Thereafter, an analysis of the state of the digitization work at the participating institutes, together with recommendations for the future, is given in a dedicated, stand-alone section.

Lund University Publications

WP2 Report from the ECHO IT Days

Author: Grasshoff Gerd
Strömqvist Sven
Uneson Marcus
Wittenburg Peter
Publication venue: [Publisher information missing]
Publication date: 01/01/2003

Abstract not available

Lund University Publications

A Multi-lingual Speech Corpus for Cognitive Research

Author: Juel Henrichsen Peter
Uneson Marcus
Publication venue: 2012
Publication date: 10/01/2013

We present the speech corpus SMALLWorlds (Spoken Multi-lingual Accounts of Logically Limited Worlds), newly established and still growing. SMALLWorlds contains monologic descriptions of scenes or worlds that are simple enough to be formally describable. The descriptions are instances of content-controlled monologue: semantically "pre-specified" but still bearing most hallmarks of spontaneous speech (hesitations and filled pauses, relaxed syntax, repetitions, self-corrections, incomplete constituents, irrelevant or redundant information, etc.) as well as idiosyncratic speaker traits. In the paper, we discuss the pros and cons of data elicited in this way. We then present a typical SMALLWorlds task: the description of a simple drawing with differently coloured circles, squares, and triangles, with no hints given as to which description strategy or language style to use. We conclude with an example of how SMALLWorlds may be used: unsupervised lexical learning from phonetic transcription. At the time of writing, SMALLWorlds consists of more than 250 recordings in a wide range of typologically diverse languages from many parts of the world, some of them unwritten and endangered.

OpenArchive@CBS

Acoustic correlates of laryngealization in Kammu

Author: Uneson Marcus
Publication venue: Lund University, Dept. of Linguistics
Publication date: 01/01/2001

Abstract not available

Lund University Publications

Om svenskans ortografiska regelbundenhet: med en nordisk utblick (On the orthographic regularity of Swedish, with a Nordic outlook)

Author: Uneson Marcus
Publication date: 01/01/2013

Lund University Publications